Derivation of Context-free Stochastic L-grammar Rules for Promoter Sequence Modeling Using Support Vector Machine

Author

  • Robertas Damaševičius
Abstract

Formal grammars can be used to describe complex repeatable structures such as DNA sequences. In this paper, we describe the structural composition of DNA sequences using a context-free stochastic L-grammar. L-grammars are a special class of parallel grammars that can model the growth of living organisms, e.g. plant development, and the morphology of a variety of organisms. We believe that parallel grammars can also be used for modeling genetic mechanisms and sequences such as promoters. Promoters are short regulatory DNA sequences located upstream of a gene. Detection of promoters in DNA sequences is important for successful gene prediction. Promoters can be recognized by certain patterns that are conserved within a species, but there are many exceptions, which makes promoter recognition a complex problem. We replace the problem of promoter recognition with the induction of context-free stochastic L-grammar rules, which are later used for the structural analysis of promoter sequences. L-grammar rules are derived automatically from the Drosophila and vertebrate promoter datasets using a genetic programming technique, and their fitness is evaluated using a Support Vector Machine (SVM) classifier. The artificial promoter sequences generated using the derived L-grammar rules are analyzed and compared with natural promoter sequences.
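
To make the idea of a context-free stochastic L-grammar concrete, the following is a minimal sketch of parallel stochastic rewriting over the DNA alphabet. It is an illustration only: the rule table `RULES`, its probabilities, and the helper names `rewrite_once` and `derive` are hypothetical and are not the rules or the procedure derived in the paper. The genetic programming search and the SVM fitness evaluation are not shown.

```python
import random

# Hypothetical rule table (NOT taken from the paper). Each nucleotide maps to
# a list of (successor, probability) pairs; probabilities per symbol sum to 1.
RULES = {
    "A": [("AT", 0.6), ("A", 0.4)],
    "T": [("TA", 0.5), ("T", 0.5)],
    "C": [("CG", 0.3), ("C", 0.7)],
    "G": [("G", 1.0)],
}


def rewrite_once(sequence, rules, rng):
    """One parallel derivation step: every symbol is rewritten at once.

    The grammar is context-free because the choice of successor depends
    only on the symbol itself, never on its neighbours.
    """
    out = []
    for symbol in sequence:
        options = rules.get(symbol)
        if options is None:
            out.append(symbol)  # identity production for symbols without rules
            continue
        successors = [s for s, _ in options]
        weights = [p for _, p in options]
        out.append(rng.choices(successors, weights=weights, k=1)[0])
    return "".join(out)


def derive(axiom, rules, steps, seed=0):
    """Apply `steps` parallel rewriting steps starting from `axiom`."""
    rng = random.Random(seed)  # fixed seed so a derivation can be reproduced
    sequence = axiom
    for _ in range(steps):
        sequence = rewrite_once(sequence, rules, rng)
    return sequence


if __name__ == "__main__":
    print(derive("ACGT", RULES, steps=4, seed=1))
```

In a setting like the one the abstract describes, sequences generated this way would be scored by a classifier and the rule table adjusted by a search procedure. Here the table is fixed by hand purely to show the rewriting mechanism.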


Similar resources

Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation

In this paper we present a novel approach of utilizing Semantic Role Labeling (SRL) information to improve Hierarchical Phrase-based Machine Translation. We propose an algorithm to extract SRL-aware Synchronous Context-Free Grammar (SCFG) rules. Conventional Hiero-style SCFG rules will also be extracted in the same framework. Special conversion rules are applied to ensure that when SRL-aware SCF...


Computation of the Probability of the Best Derivation of an Initial Substring from a Stochastic Context-Free Grammar

Recently, Stochastic Context-Free Grammars have been considered important for use in Language Modeling for Automatic Speech Recognition tasks [6, 10]. In [6], Jelinek and Lafferty presented and solved the problem of computation of the probability of initial substring generation by using Stochastic Context-Free Grammars. This paper seeks to apply a Viterbi scheme to achieve the computation of th...


Recognizing Multitasked Activities using Stochastic Context-Free Grammar

In this paper, we present techniques for characterizing complex, multi-tasked activities that require both exemplars and models. Exemplars are used to represent object context, image features, and motion appearances to label domain-specific events. Then, by representing each event with a unique symbol, a sequence of interactions can be described as an ordered symbolic string. A model of stochast...


Max-Margin Parsing

We present a novel discriminative approach to parsing inspired by the large-margin criterion underlying support vector machines. Our formulation uses a factorization analogous to the standard dynamic programs for parsing. In particular, it allows one to efficiently learn a model which discriminates among the entire space of parse trees, as opposed to reranking the top few candidates. Our models...


Stochastic modeling of RNA pseudoknotted structures: a grammatical approach

MOTIVATION: Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFG) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally a context-sensitive grammar is required, which would impose a large...




Publication date: 2008